The Final Report

- Introduction

Stashed in an unmarked bin of memorabilia, is my prized 1995 tie-dyed DARE tee-shirt, a paradoxical relic of elementary classroom hours spent discussing the dangers of peer-led drug use. Those fears were not unwarranted, the CDC (Center for Disease Control) reports that 136 Americans die daily from opioids and illicitly manufactured fentanyl¹, for me - a friend or three. In my experience, most drug prevention programs portray cannabis as the catalyst to the life doomed by addiction.

When recreational cannabis use became legal in Michigan, the culture shift was palpable. In an ironic twist of fate, the taboo habits of those once deemed as social deviants are now capitalized and funding K-12 education². I am fascinated by this turn of events, and I am curious if there are predictors from more tenured “green” states that reveal unintended ill-consequences.

In my data mining project, I explore the relationship between the presence of cannabis dispensaries and neighborhood crime rates. By replicating the methodology of Brinkman and Mok-Lamme’s 2019 study, “Not in my backyard? Not so fast. The effect of marijuana legalization on neighborhood crime” (NIMBY), I aim to contribute a nuanced perspective on this budding industry and how it shapes public safety and city planning.

My approach involved immersing myself in the wild world of crime and cannabis statistics in Colorado. Using the data sources outlined in NIMBY, I built the variables defined in NIMBY’s linear regression model, described in more detail in the methods section.

In line with the results in NIMBY, my findings indicate that there is no statistically significant relationship between an increased presence of cannabis dispensaries and crime.

- Methods

While all of the data referenced in NIMBY is publicly available, I found it necessary to deviate from the original paper’s methodology due to data quality issues and time constraints.

My diagram below illustrates the utilization of R code to construct a data pipeline for the linear model and its associated variables.

In the interest of interpretability, I divided my work into four distinct segments, each highlighted in the navigation bar of this site. I hope this structured breakdown allows readers to navigate through my process efficiently, providing transparent insight into my methods employed and limitations faced throughout this project.

In summation, the data mining pipeline consisted of several stages, each color coded above.

Dispensary license records were cleaned and pre-processed, removing irrelevant columns, standardizing the format, and aggregating the data by county and year. | code
Leveraging the power of the TidyCensus package, I extracted population data for the target variables from the American Community Survey (ACS) census database. | code
I wove the datasets together using county-level identifiers (GEOID), allowing for the computation of per capita values for dispensaries and crime rates and year-over-year fluctuations. | code

Finally, I introduced the new variables into a linear regression model. | code

The goal is to estimate the causal effect of changes in dispensary density on changes in crime rates, while accounting for potential biases and confounding factors using the linear regression model above.

Variable	Defined
Δcrime j,t	year-over-year changes in crime rates the jth geography in month t
Δdisp j,t	year-over-year changes in dispensary rates
j	neighborhood
t	time
𝛽0	(intercept) baseline level of year-over-year changes in crime rates
𝛽1	(coefficient) expected change in year-over-year crime rates associated with a one-unit increase
𝛽2	vector of estimated coefficients on the control variables (demographic characteristics, economic conditions)
X	a vector of control variables (demographic characteristics, economic conditions)
Δ𝛿t	time fixed effects
⋲ j,t	error term

- Results and Discussion

To gauge the differences between my work and NIMBY, I recreated a couple of the visual aids.

Comparing Dispensary Counts

Figure 1. From NIMBY:

**how to read:** “The panels of this figure show the total number of store fronts (calculated using data from Colorado Department of Revenue) cross tabbed by stores that sell only medical marijuana (green), both medical and recreational marijuana (red), and only recreational marijuana (blue). The vertical red line shows the first date (January 1, 2014) when recreational sales were legal.” (Brinkman Pg. 5)

interpretation: Once recreational cannabis was legalized in Colorado, we see many dispensaries begin selling both recreational and medical products. In 2015, recreational only stores begin their growth trajectory, while medical and both store fronts remain relatively flat.

Replicated Area Graph

comparison: Although I applied a different approach, my total counts are similar to the original study. My graph tells a comparable story as the plot above, but there are a few plots that look suspicious. With more time, I would investigate and validate the sharp dips seen in early 2015 and 2016.

Comparing Control Variables

Table 1. From NIMBY:

**how to read**: Panel B, shows the county quartiles (lowest to highest) by the four variables. The figures represent the three-year average of year-over-year growth in dispensary businesses. (Brinkman Pg. 6)

interpretation: the average county in Colorado saw a 11.9% increase in the number of dispensaries between 2014-2017. Counties that fall within the 4th quartile (highest) for poverty rates saw the largest increase (22.1%) average growth in the number of dispensaries, followed by the 4th quartile for Hispanic populations, and the 1st quartile (lowest) employed populations.

Replicated Table

how to read: figures represent the county quartiles (lowest to highest) by the four variables, showing represent three-year average of year-over-year growth in dispensary businesses.

comparison: now let’s look at my results, my calculations led me to a 11.5% three-year average growth in dispensaries, which is very close to NIMBY’s result. However, when I compare the individual demographics there is a lot of variance between the two results. The lowest quartile for employed populations revealed the most growth in dispensaries (+50.8%), followed by the 2nd quartile for black populations (27.0%).

The gaps between my employment quartiles and NIMBY’s are too wide to ignore. With more time, I’d explore the difference in methodologies and identify the contributing factors.

- Conclusions

Comparing the Dependent and Independent Variables:

Does dispensary density (green) influence crime rate (magenta)?

**how to read:** *the three-year average of crimes per 10,000 residents are plotted by Colorado county, overlaid by the three-year average county total dispensary count.*

interpretation: Denver is clearly where we see the larger crime rates, and more dispensaries. NIMBY refers to Denver as the “mecca of recreational cannabis.” El Paso is an interesting county with a relatively high dispensary density and low crime rate, as well as Pueblo County where the inverse is true.

From this map, at a glance, no clear relationship between dispensary density and crime rate is evident; but what does the math reveal?

Evaluating The Replicated Model

interpretation: this plot evaluates the accuracy of the replicated model in predicting the relationship between the growth of county crime rates and county dispensary density growth, while controlling for poverty rate, racial demographics, and employment rate.

The unequal distribution of plots around the horizontal line (y=0), indicates that my replicated model is not an ideal fit for the data.

evaluation: replicated model’s output

# import data
linear_variables <- read.csv("~/gvsu/winter 23/CIS 635/NIMBY/cleaning/linear_model/variables/linear_variable.csv")

model2 <- lm(linear_variables$crime_yoy_avg ~ linear_variables$dsp_yoy_avg +
               linear_variables$pct_poverty_rate  +
               linear_variables$pct_black  +
               linear_variables$pct_hispanic  +
               linear_variables$employeed_pop,
                data = linear_variables)
summary(model2)


Call:
lm(formula = linear_variables$crime_yoy_avg ~ linear_variables$dsp_yoy_avg + 
    linear_variables$pct_poverty_rate + linear_variables$pct_black + 
    linear_variables$pct_hispanic + linear_variables$employeed_pop, 
    data = linear_variables)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.47400 -0.13238 -0.04318  0.08269  1.00885 

Coefficients:
                                    Estimate Std. Error t value Pr(>|t|)
(Intercept)                        8.627e-02  1.145e-01   0.753    0.454
linear_variables$dsp_yoy_avg      -6.323e-02  1.543e-01  -0.410    0.683
linear_variables$pct_poverty_rate  6.074e-01  9.582e-01   0.634    0.529
linear_variables$pct_black        -2.071e-01  1.514e+00  -0.137    0.892
linear_variables$pct_hispanic     -1.692e-01  3.680e-01  -0.460    0.647
linear_variables$employeed_pop    -2.045e-07  4.720e-07  -0.433    0.666

Residual standard error: 0.2976 on 58 degrees of freedom
Multiple R-squared:  0.01903,   Adjusted R-squared:  -0.06554 
F-statistic: 0.225 on 5 and 58 DF,  p-value: 0.9502

Reviewing the summary output, there are several indicators that my model is a poor fit for the data:

the negative adjust r-square
low multiple R square, reveals that dispensary density only explains about 1.9% of the variation in crime rates
the residuals skew left

Additionally, the high p-value (0.95) indicates that there is insufficient evidence to reject the null hypothesis (Ho: dispensary presence has no effect on crime rates).

Results Comparison:

This model is part of NIMBY’s baseline estimation strategy. Their next model iteration controls for the “miles from the border.” It is from this final set of variables they draw their conclusions. Additionally, while most of the insights in NIMBY look at the census tract level, the authors discuss the county effects:

“The OLS estimates show no significant correlation between changes in dispensary density and changes in crime at the county level. The IV estimates are consistent with the tract-level results, showing some evidence of reduced crime in counties that received more dispensaries.

The results, however, are weaker both economically and statistically, which is to be expected given that we have shown that the effects tend to be contained at the neighborhood level where the dispensaries are located.”(Brinkman and Mok-Lamme 2019)

Final Thoughts:

Tom Khabaza’s Third Law of Data Mining states that “Data preparation is more than half of every data mining process.” This maxim was my guiding principle as I navigated a labyrinth of data quality challenges. Although I reproduced many of the macro themes from NIMBY, the intricate details unveiled my code requires further refinement and reassessment of my methodology.

Predicting crime rates is no straightforward task. Even with careful data construction, an accurate predictive model requires trial and error, domain knowledge, and a stamina for problem-solving. After this experience, I am more inclined to trust the results of meta-analyses of existing studies than a single investigation.

To holistically understand the long-term implications of legal recreational cannabis use, a multifaceted perspective is necessary. If I were to embark on a similar study in Michigan, rather than refine this model, I would radically reimagine my approach. Crime rates are swayed by a kaleidoscope of factors, but the cannabis industry remains an unproven culprit. Scrutinizing the allocation of cannabis-generated state revenue, and extracting the measureable benefits for local municipalities presents a more captivating challenge to explore.

- Data and Software

Data Sources:

Colorado Department of Revenue: Medical and Retail Marijuana Licencees
City and County of Denver Surveyor’s Office: Denver Geographic Boundaries
FBI Uniform Crime Reporting Program: Colorado
2014 American Community Survey: Colorado Population, Race, Ethnicity, and Poverty Rates

Software:

Rstudio
- Packages:
  - cowplot (Wilke 2020)
  - fs (Hester, Wickham, and Csárdi 2021)
  - lubridate (Grolemund and Wickham 2011)
  - maps (Becker, Minka, and Deckmyn. 2022)
  - openxlsx (Schauberger and Walker 2023)
  - readx (Wickham and Bryan 2023)
  - sf (Pebesma 2018)
  - tidycensus (Walker and Herman 2023)
  - tidygeocoder (Cambon et al. 2021)
  - tidyverse (Wickham et al. 2019)
  - tigris (Walker 2023)

- Works Cited

Brinkman, J., & Mok-Lamme, D. (2019). Not in my backyard? Not so fast. The effect of marijuana legalization on neighborhood crime. Regional Science and Urban Economics, 78. https://doi-org.ezproxy.gvsu.edu/10.1016/j.regsciurbeco.2019.103460

Anderson, D. M., & Rees, D. I. (2023). The Public Health Effects of Legalizing Marijuana. Journal of Economic Literature, 61(1), 86–143. https://doi-org.ezproxy.gvsu.edu/10.1257/jel.20211635

Hughes, Lorine A., Lonnie M. Schaible, and Katherine Jimmerson. Marijuana Dispensaries and Neighborhood Crime and Disorder in Denver, Colorado. Justice Quarterly, vol. 37, no. 3, 2020, pp. 461-485, doi: 10.1080/07418825.2019.1567807.

References

Becker, Original S code by Richard A., Allan R. Wilks. R version by Ray Brownrigg. Enhancements by Thomas P Minka, and Alex Deckmyn. 2022. “Maps: Draw Geographical Maps.” https://CRAN.R-project.org/package=maps.

Brinkman, Jeffrey, and David Mok-Lamme. 2019. “Not in My Backyard? Not so Fast. The Effect of Marijuana Legalization on Neighborhood Crime.” Regional Science and Urban Economics 78 (September): 103460. https://doi.org/10.1016/j.regsciurbeco.2019.103460.

Cambon, Jesse, Diego Hernangómez, Christopher Belanger, and Daniel Possenriede. 2021. “Tidygeocoder: An r Package for Geocoding” 6: 3544. https://doi.org/10.21105/joss.03544.

Grolemund, Garrett, and Hadley Wickham. 2011. “Dates and Times Made Easy with Lubridate” 40. https://www.jstatsoft.org/v40/i03/.

Hester, Jim, Hadley Wickham, and Gábor Csárdi. 2021. “Fs: Cross-Platform File System Operations Based on ’Libuv’.” https://CRAN.R-project.org/package=fs.

Pebesma, Edzer. 2018. “Simple Features for r: Standardized Support for Spatial Vector Data” 10. https://doi.org/10.32614/RJ-2018-009.

Schauberger, Philipp, and Alexander Walker. 2023. “Openxlsx: Read, Write and Edit Xlsx Files.” https://CRAN.R-project.org/package=openxlsx.

Walker, Kyle. 2023. “Tigris: Load Census TIGER/Line Shapefiles.” https://CRAN.R-project.org/package=tigris.

Walker, Kyle, and Matt Herman. 2023. “Tidycensus: Load US Census Boundary and Attribute Data as ’Tidyverse’ and ’Sf’-Ready Data Frames.” https://CRAN.R-project.org/package=tidycensus.

Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the Tidyverse” 4: 1686. https://doi.org/10.21105/joss.01686.

Wickham, Hadley, and Jennifer Bryan. 2023. “Readxl: Read Excel Files.” https://CRAN.R-project.org/package=readxl.

Wilke, Claus O. 2020. “Cowplot: Streamlined Plot Theme and Plot Annotations for ’Ggplot2’.” https://CRAN.R-project.org/package=cowplot.

Footnotes

Drug Abuse Statistics. “Drug Overdose Deaths.” Drug Abuse Statistics, 2022, https://drugabusestatistics.org/drug-overdose-deaths, Accessed 12 Apr. 2023.↩︎
Michigan Department of Treasury. “Adult-Use Marijuana Payments Being Distributed to Michigan Municipalities and Counties.” Michigan.gov, 28 Feb. 2023, https://www.michigan.gov/treasury/news/2023/02/28/adult-use-marijuana-payments-being–distributed-to-michigan-municipalities-and-counties, , Accessed 12 Apr. 2023.↩︎